Abstract

One of the problems in part-of-speech tagging of real-world texts is that of words unknown to the lexicon. In (Mikheev, 1996), a technique for fully unsupervised statistical acquisition of rules which guess possible parts-of-speech for unknown words was proposed. One of the over-simplifications assumed by this learning technique was the acquisition of morphological rules which obey only simple concatenative regularities of the main word with an affix. In this paper we extend this technique to the non-concatenative cases of suffixation and assess the gain in performance.



1 Introduction 

Part-of-speech (POS) taggers are programs which assign a single POS-tag to a word-token, provided that it is known what parts-of-speech this word can take on in principle. In order to do that, taggers are supplied with a lexicon that lists possible POS-tags for words which were seen at the training phase. Naturally, when tagging real-world texts, one can expect to encounter words which were not seen at the training phase and hence not included in the lexicon. This is where word-POS guessers take their place: they analyse word features, e.g. a word's leading and trailing characters, to figure out its possible POS categories.
Currently, most taggers are supplied with a word-guessing component for dealing with unknown words. The most popular guessing strategy is so-called "ending guessing", in which a possible set of POS-tags for a word is guessed solely on the basis of its trailing characters. An example of such a guesser is the guesser supplied with the Xerox tagger (Kupiec, 1992). A similar approach was taken in (Weischedel et al., 1993), where an unknown word was guessed given the probabilities for an unknown word to be of a particular POS, its capitalisation feature and its ending. In (Brill, 1995) a system of rules which uses both ending-guessing and more morphologically motivated rules is described. The best of these methods were reported to achieve 82-85% tagging accuracy on unknown words (e.g. Brill, 1995; Weischedel et al., 1993).

In (Mikheev, 1996) a cascading word-POS guesser is described. It applies first morphological prefix and suffix guessing rules and then ending-guessing rules. This guesser is reported to achieve higher guessing accuracy than the guessers cited above: on average about 8-9% better than the Xerox guesser and 6-7% better than Brill's guesser, reaching 87-92% tagging accuracy on unknown words.

There are two kinds of word-guessing rules employed by the cascading guesser: morphological rules and ending-guessing rules. Morphological word-guessing rules describe how one word can be guessed given that another word is known. In English, as in many other languages, morphological word formation is realised by affixation: prefixation and suffixation. So there are two kinds of morphological rules: suffix rules (A_s), which are applied to the tail of a word, and prefix rules (A_p), which are applied to the beginning of a word. For example, the prefix rule:

A_p: [un (VBD VBN) (JJ)]

says that if segmenting the prefix "un" from an unknown word results in a word which is found in the lexicon as a past verb and participle (VBD VBN), we conclude that the unknown word is an adjective (JJ). This rule works, for instance, for the words [developed → undeveloped]. An example of a suffix rule is:

A_s: [ed (NN VB) (JJ VBD VBN)]

This rule says that if by stripping the suffix "ed" from an unknown word we produce a word with the POS-class noun/verb (NN VB), the unknown word is of the class adjective/past-verb/participle (JJ VBD VBN). This rule works, for instance, for the word pairs [book → booked], [water → watered], etc.
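The two example rules can be sketched in Python as follows. This is a minimal illustration, not the implementation from (Mikheev, 1996): the toy lexicon is hypothetical, and we assume that the looked-up word must carry exactly the I-class as its full POS-set.

```python
from typing import Dict, Optional, Set, Tuple

Lexicon = Dict[str, Set[str]]

# Hypothetical toy lexicon (word -> POS-set); not the Brown Corpus lexicon.
LEXICON: Lexicon = {
    "developed": {"VBD", "VBN"},
    "book": {"NN", "VB"},
    "water": {"NN", "VB"},
}


def apply_prefix_rule(word: str, prefix: str, i_class: Tuple[str, ...],
                      r_class: Tuple[str, ...], lexicon: Lexicon) -> Optional[Set[str]]:
    """A_p rule: strip `prefix`; if the rest is listed with POS-set I, guess R."""
    if not word.startswith(prefix) or len(word) <= len(prefix):
        return None  # rule not applicable to this word
    stem = word[len(prefix):]
    # Assumption: the lexicon entry must equal the I-class exactly.
    return set(r_class) if lexicon.get(stem) == set(i_class) else None


def apply_suffix_rule(word: str, suffix: str, i_class: Tuple[str, ...],
                      r_class: Tuple[str, ...], lexicon: Lexicon) -> Optional[Set[str]]:
    """A_s rule: strip `suffix`; if the rest is listed with POS-set I, guess R."""
    if not word.endswith(suffix) or len(word) <= len(suffix):
        return None
    stem = word[: -len(suffix)]
    return set(r_class) if lexicon.get(stem) == set(i_class) else None


print(sorted(apply_prefix_rule("undeveloped", "un", ("VBD", "VBN"), ("JJ",), LEXICON)))
# ['JJ']
print(sorted(apply_suffix_rule("watered", "ed", ("NN", "VB"), ("JJ", "VBD", "VBN"), LEXICON)))
# ['JJ', 'VBD', 'VBN']
```

A `None` result means the rule either does not apply to the word or its lookup failed; in both cases the guess is left to other rules.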

Unlike morphological guessing rules, ending- 
guessing rules do not require the main form of an 
unknown word to be listed in the lexicon. These 
rules guess a POS-class for a word just on the ba- 
sis of its ending characters and without looking up 
its stem in the lexicon. For example, an ending- 
guessing rule 

A_e: [ing - (JJ NN VBG)]

says that if a word ends with "ing" it can be an adjective, a noun or a gerund. Unlike a morphological rule, this rule does not check whether the substring preceding the "ing"-ending is a word with a particular POS-tag.
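An ending-guessing rule needs no lexicon at all, which a short sketch makes explicit. The requirement of at least one character before the ending is our own assumption, and "blorking"/"blorked" are nonce words used only for illustration.

```python
from typing import Optional, Set, Tuple


def apply_ending_rule(word: str, ending: str,
                      r_class: Tuple[str, ...]) -> Optional[Set[str]]:
    """A_e rule: look only at the trailing characters; no lexicon lookup."""
    # Assumption: require at least one character before the ending.
    if word.endswith(ending) and len(word) > len(ending):
        return set(r_class)
    return None


rule = ("ing", ("JJ", "NN", "VBG"))
print(sorted(apply_ending_rule("blorking", *rule)))  # ['JJ', 'NN', 'VBG']
print(apply_ending_rule("blorked", *rule))           # None
```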

Not surprisingly, morphological guessing rules are more accurate than ending-guessing rules, but their lexical coverage is more restricted, i.e. they are able to cover fewer unknown words. Since they are more accurate, in the cascading guesser they were applied before the ending-guessing rules and improved the precision of the guesses by about 5%. This resulted in about 2% higher accuracy of tagging on unknown words.
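The cascading order (morphological rules first, ending-guessing rules only as a back-off) can be sketched as a loop over rule-sets. This is an illustration under stated assumptions: within a rule-set we simply take the first rule that fires (assuming rules are pre-sorted by preference), and both toy rules and their R-classes are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Set

Guess = Optional[Set[str]]
Rule = Callable[[str], Guess]


def cascade(word: str, rule_sets: Iterable[Iterable[Rule]]) -> Guess:
    """Apply rule-sets in order; the first rule that fires decides the guess."""
    for rule_set in rule_sets:
        for rule in rule_set:  # assumption: rules are pre-sorted by preference
            guess = rule(word)
            if guess is not None:
                return guess
    return None  # no rule applied; the caller falls back to some default


# Hypothetical toy data: one morphological and one ending-guessing rule for "ed".
lexicon: Dict[str, Set[str]] = {"book": {"NN", "VB"}}


def morph_ed(w: str) -> Guess:
    ok = w.endswith("ed") and lexicon.get(w[:-2]) == {"NN", "VB"}
    return {"JJ", "VBD", "VBN"} if ok else None


def ending_ed(w: str) -> Guess:
    return {"JJ", "NN", "VBD", "VBN"} if w.endswith("ed") else None  # hypothetical R-class


rule_sets = [[morph_ed], [ending_ed]]  # morphological rules before ending rules
print(sorted(cascade("booked", rule_sets)))   # ['JJ', 'VBD', 'VBN']
print(sorted(cascade("blicked", rule_sets)))  # ['JJ', 'NN', 'VBD', 'VBN']
```

The known stem "book" lets the more precise morphological rule decide, while the unknown "blicked" falls through to the broader ending rule.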

Although in general the performance of the cascading guesser was found to be only 6% worse than a general-language lexicon lookup, one of the over-simplifications assumed in the extraction of the morphological rules was that they obey only simple concatenative regularities:

book → book+ed; take → take+n; play → play+ing.

No attempts were made to model non- 
concatenative cases which are quite common in 
English, as for instance: 

try → tries; reduce → reducing; advise → advisable.

We therefore expected that incorporating a set of guessing rules able to capture morphological word dependencies with letter alterations would extend the lexical coverage of the morphological rules and hence might improve the overall guessing accuracy.

In the rest of the paper, we first briefly outline the unsupervised statistical learning technique proposed in (Mikheev, 1996); then we propose a modification which allows for the learning of non-concatenative morphological rules; and finally, we evaluate the contribution of the non-concatenative suffix morphological rules to the overall tagging accuracy on unknown words using the cascading guesser.

2 The Learning Paradigm 

The major topic in the development of word-POS guessers is the strategy to be used for the acquisition of the guessing rules. Brill (Brill, 1995) outlines a transformation-based learner which learns guessing rules from a pre-tagged training corpus. A statistics-based suffix learner is presented in (Schmid, 1994). From a pre-tagged training corpus it constructs a suffix tree where every suffix is associated with its information measure.

The learning technique employed in the induction of the rules of the cascading guesser (Mikheev, 1996) does not require specially prepared training data and employs fully unsupervised statistical learning from the lexicon supplied with the tagger and word-frequencies obtained from a raw corpus. The learning is implemented as a two-stage process with feedback. First, with certain parameters set, a set of guessing rules is acquired; then it is evaluated, and the results of the evaluation are used for the re-acquisition of a better tuned rule-set. As already mentioned, this learning technique proved to be very successful, but it did not attempt the acquisition of word-guessing rules which do not obey simple concatenations of a main word with some affix. Here we present an extension to accommodate such cases.

2.1 Rule Extraction Phase 



In the initial learning technique (Mikheev, 1996), which accounted only for simple concatenative regularities, a guessing rule was seen as a triple A = (S, I, R), where

S is the affix itself;

I is the POS-class of words which should be looked up in the lexicon as main forms;

R is the POS-class which is assigned to unknown words if the rule is satisfied.

Here we extend this structure to handle cases of mutation in the last n letters of the main word (words of the I-class), as, for instance, in the case of try → tries, where the letter "y" is changed to "i" before the suffix. To accommodate such alterations we included an additional mutation element (M) in the rule structure. This element keeps the segment to be added to the main word. So the application of a guessing rule can be described as:

unknown-word - S + M : I → R

i.e. from an unknown word we strip the affix S, add the mutative segment M, look up the produced string in the lexicon, and if it is of class I we conclude that the unknown word is of class R. For example, the suffix rule A_s:

[S=ied I=(NN VB) R=(JJ VBD VBN) M=y]
or, in short, [ied (NN VB) (JJ VBD VBN) y]

says that if there is an unknown word which ends with "ied", we should strip this ending and append the string "y" to the remaining part. If we then find this word in the lexicon as (NN VB) (noun/verb), we conclude that the guessed word is of category (JJ VBD VBN) (adjective, past verb or participle). This rule, for example, will work for word pairs like specify → specified or deny → denied.
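The scheme unknown-word - S + M : I → R can be sketched directly; with M set to the empty string the same function reduces to an ordinary concatenative suffix rule. The toy lexicon and its tags are hypothetical, and we again assume an exact match between the lexicon entry and the I-class.

```python
from typing import Dict, Optional, Set

Lexicon = Dict[str, Set[str]]

# Hypothetical toy lexicon; tags chosen for illustration only.
LEXICON: Lexicon = {
    "copy": {"NN", "VB"},
    "reply": {"NN", "VB"},
    "affects": {"NNS", "VBZ"},
}


def apply_mutative_rule(word: str, s: str, i_class: Set[str], r_class: Set[str],
                        m: str, lexicon: Lexicon) -> Optional[Set[str]]:
    """unknown-word - S + M : I -> R (suffix version)."""
    if not word.endswith(s) or len(word) <= len(s):
        return None
    candidate = word[: len(word) - len(s)] + m  # strip S, then append M
    # Assumption: the entry must equal the I-class exactly.
    return set(r_class) if lexicon.get(candidate) == i_class else None


ied_rule = ("ied", {"NN", "VB"}, {"JJ", "VBD", "VBN"}, "y")
print(sorted(apply_mutative_rule("copied", *ied_rule, LEXICON)))  # ['JJ', 'VBD', 'VBN']
```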

Next, we modified the ∇ operator which was used for the extraction of morphological guessing rules. We augmented this operator with the index n which specifies the length of the mutative ending of the main word. Thus when the index n is 0, the result of the application of the ∇_0 operator will be a morphological rule without alterations. The ∇_1 operator will extract the rules with alterations in the last letter of the main word, as in the example above. The ∇ operator is applied to a pair of words from the lexicon. First it segments the last n characters of the shorter word and stores them in the M element of the rule. Then it tries to segment an affix by subtracting the shorter word without the mutative ending from the longer word. If the subtraction results in a non-empty string, it creates a morphological rule by storing the POS-class of the shorter word as the I-class, the POS-class of the longer word as the R-class, and the segmented affix itself. For example:

[booked (JJ VBD VBN)] ∇_0 [book (NN VB)] → A_s: [ed (NN VB) (JJ VBD VBN) ""]

[advisable (JJ VBD VBN)] ∇_1 [advise (NN VB)] → A_s: [able (NN VB) (JJ VBD VBN) "e"]
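The two ∇ examples can be reproduced with a small sketch of the suffix version of the operator. We assume the caller passes the longer word first, and we store POS-classes as sorted tuples so that identical rules extracted from different pairs compare equal; neither detail is taken from the original implementation.

```python
from typing import Optional, Set, Tuple

Entry = Tuple[str, Set[str]]                               # (word, POS-set)
Rule = Tuple[str, Tuple[str, ...], Tuple[str, ...], str]   # (S, I, R, M)


def nabla(longer: Entry, shorter: Entry, n: int) -> Optional[Rule]:
    """Sketch of the suffix version of the nabla_n operator."""
    long_word, long_tags = longer
    short_word, short_tags = shorter
    if n > len(short_word):
        return None
    base = short_word[: len(short_word) - n]     # main word without its mutative ending
    mutation = short_word[len(short_word) - n:]  # last n letters, stored as M
    if not long_word.startswith(base):
        return None
    affix = long_word[len(base):]                # "subtract" the base from the longer word
    if not affix:
        return None
    return (affix, tuple(sorted(short_tags)), tuple(sorted(long_tags)), mutation)


print(nabla(("booked", {"JJ", "VBD", "VBN"}), ("book", {"NN", "VB"}), 0))
# ('ed', ('NN', 'VB'), ('JJ', 'VBD', 'VBN'), '')
print(nabla(("advisable", {"JJ", "VBD", "VBN"}), ("advise", {"NN", "VB"}), 1))
# ('able', ('NN', 'VB'), ('JJ', 'VBD', 'VBN'), 'e')
```

Applied with n = 0 to a pair such as tag/tagging, the same function yields the affix "ging", the kind of non-morphemic segment discussed in Section 5.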

The ∇ operator is applied to all possible lexicon-entry pairs, and if a rule produced by such an application has already been extracted from another pair, its frequency count (f) is incremented. Thus sets of morphological guessing rules together with their calculated frequencies are produced. Next, from these sets of guessing rules we need to cut out infrequent rules which might bias the further learning process. To do that we eliminate all the rules with frequency f less than a certain threshold θ_f¹. Such filtering reduces the rule-sets more than tenfold and removes clearly coincidental cases from the rules.
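Counting rules over all lexicon-entry pairs and filtering them by θ_f can be sketched as below. The extractor here is a minimal stand-in for ∇_0 (concatenation only), the toy lexicon is hypothetical, and θ_f = 2 is chosen from the low range mentioned in the footnote.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Dict, Optional, Set, Tuple

Entry = Tuple[str, Set[str]]
Rule = Tuple[str, Tuple[str, ...], Tuple[str, ...], str]


def extract_concatenative(longer: Entry, shorter: Entry) -> Optional[Rule]:
    """Minimal stand-in for nabla_0: the longer word must extend the shorter one."""
    (lw, lt), (sw, st) = longer, shorter
    if len(lw) > len(sw) and lw.startswith(sw):
        return (lw[len(sw):], tuple(sorted(st)), tuple(sorted(lt)), "")
    return None


def frequent_rules(lexicon: Dict[str, Set[str]],
                   extract: Callable[[Entry, Entry], Optional[Rule]],
                   theta_f: int) -> Dict[Rule, int]:
    """Apply `extract` to all ordered entry pairs, count rules, drop those with f < theta_f."""
    counts: Counter = Counter()
    for a, b in permutations(lexicon.items(), 2):
        rule = extract(a, b)
        if rule is not None:
            counts[rule] += 1  # same rule from another pair: increment f
    return {rule: f for rule, f in counts.items() if f >= theta_f}


TOY_LEXICON = {  # hypothetical entries
    "book": {"NN", "VB"}, "booked": {"JJ", "VBD", "VBN"},
    "cook": {"NN", "VB"}, "cooked": {"JJ", "VBD", "VBN"},
    "look": {"NN", "VB"}, "looked": {"JJ", "VBD", "VBN"},
    "cooker": {"NN"},
}
print(frequent_rules(TOY_LEXICON, extract_concatenative, theta_f=2))
# {('ed', ('NN', 'VB'), ('JJ', 'VBD', 'VBN'), ''): 3}
```

The "er" rule extracted from cook/cooker occurs only once and is filtered out.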

2.2 Rule Scoring Phase 

Of course, not all acquired rules are equally good as plausible guesses about word-classes. So, for every acquired rule we need to estimate whether it is an effective rule which is worth retaining in the final rule-set. To perform such estimation we take each rule in turn from the rule-sets produced at the rule extraction phase, take each word-token from the corpus, and guess its POS-set using the rule if the rule is applicable to the word. For example, if a guessing rule strips a particular suffix and the current word from the corpus does not have such a suffix, we classify this word and rule as incompatible and the rule as not applicable to that word. If the rule is applicable to the word, we perform a lookup in the lexicon and then compare the result of the guess with the information listed in the lexicon. If the guessed POS-set is the same as the POS-set stated in the lexicon, we count it as a success; otherwise it is a failure. Then for each rule we calculate its score as explained in (Mikheev, 1996) using the following scoring function:
$$ \mathrm{score}_i = \hat{p}_i - 1.65 \cdot \frac{\sqrt{\hat{p}_i (1 - \hat{p}_i) / n_i}}{1 + \log(|S_i|)} $$

where $\hat{p}_i$ is the proportion of positive outcomes ($x_i$) of the rule application to the total number of words compatible with the rule ($n_i$), and $|S_i|$ is the length of the affix. We also smooth $\hat{p}_i$ so as not to have zeros in positive or negative outcome probabilities:

$$ \hat{p}_i = \frac{x_i + 0.5}{n_i + 1} $$
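The scoring function and its smoothing translate directly into code. This sketch assumes the natural logarithm (the base is not stated) and rejects n = 0 and empty affixes, for which the formula is undefined.

```python
import math


def rule_score(x: int, n: int, affix_len: int, t: float = 1.65) -> float:
    """score = p - t * sqrt(p(1-p)/n) / (1 + log|S|), with p smoothed as (x+0.5)/(n+1)."""
    if n <= 0 or affix_len <= 0:
        raise ValueError("need at least one compatible word and a non-empty affix")
    p = (x + 0.5) / (n + 1)                         # smoothed proportion of successes
    penalty = t * math.sqrt(p * (1 - p) / n)        # shrinks as evidence n grows
    return p - penalty / (1 + math.log(affix_len))  # assumption: natural logarithm


few = rule_score(9, 10, 2)     # 9 successes out of 10 compatible words
many = rule_score(90, 100, 2)  # same success rate, ten times the evidence
longer = rule_score(9, 10, 3)  # same counts, longer affix
print(many > few, longer > few)  # True True
```

Both comparisons reflect the design of the formula: more evidence shrinks the confidence penalty, and longer affixes are penalised less.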

Setting the threshold θ_s at a certain level lets only the rules whose score is higher than the threshold be included in the final rule-sets. The method for setting the threshold is based on empirical evaluations of the rule-sets and is described in Section 2.3.



¹ Usually we set this threshold quite low: 2-4.



2.3 Setting the Threshold 

The task of assigning a set of POS-tags to a particular word is actually quite similar to the task of document categorisation, where a document should be assigned a set of descriptors which represent its contents. The performance of such an assignment can be measured in terms of:

recall - the percentage of the POS-tags a word actually has (according to the lexicon) which the guesser assigned to it;

precision - the percentage of POS-tags the guesser assigned correctly over the total number of POS-tags it assigned to the word;

coverage - the proportion of words which the guesser was able to classify, but not necessarily correctly.
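The three measures can be sketched per word as follows. We assume precision and recall are computed only for words that received a (non-empty) guess, while coverage counts over all words; this split is our reading of the definitions, not a detail stated in the text.

```python
from typing import List, Optional, Set, Tuple


def precision_recall(guessed: Set[str], true_tags: Set[str]) -> Tuple[float, float]:
    """Per-word precision and recall of a non-empty guessed POS-set against the lexicon."""
    correct = len(guessed & true_tags)
    return correct / len(guessed), correct / len(true_tags)


def coverage(guesses: List[Optional[Set[str]]]) -> float:
    """Share of words for which the guesser produced any POS-set."""
    return sum(1 for g in guesses if g) / len(guesses)


p, r = precision_recall({"JJ", "NN", "VBG"}, {"NN", "VBG"})
print(round(p, 3), r)                              # 0.667 1.0
print(round(coverage([{"NN"}, None, {"JJ"}]), 3))  # 0.667
```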

There are two types of test data in use at this stage. First, we measure the performance of a guessing rule-set against the actual lexicon: every word from the lexicon, except for closed-class words and words shorter than five characters, is guessed by the rule-sets and the results are compared with the information the word has in the lexicon. In the second experiment we measure the performance of the guessing rule-sets against the training corpus. For every word we measure its metrics exactly as in the previous experiment. Then we multiply these measures by the corpus frequency of this particular word and average them. Thus the most frequent words have the greatest influence on the final measures.
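The corpus-based variant weights each word's metric by its corpus frequency. A minimal sketch, assuming "average" means dividing by the total frequency (a frequency-weighted mean); the words and counts are hypothetical.

```python
from typing import Dict


def frequency_weighted(measure: Dict[str, float], freq: Dict[str, int]) -> float:
    """Weight each word's metric by its corpus frequency and normalise."""
    total = sum(freq[w] for w in measure)
    return sum(value * freq[w] for w, value in measure.items()) / total


per_word_precision = {"classified": 1.0, "merging": 0.5}  # hypothetical values
corpus_freq = {"classified": 3, "merging": 1}              # hypothetical counts
print(frequency_weighted(per_word_precision, corpus_freq))  # 0.875
```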

To extract the best-scoring rule-sets for each acquired set of rules, we produce several final rule-sets by setting the threshold θ_s at different values. For each produced rule-set we record the three metrics (precision, recall and coverage) and choose the sets with the best aggregate measures.
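The threshold sweep can be sketched as follows. How the three metrics are aggregated is not specified here, so the product used below is purely an assumption, as are the toy rules and metric values.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Metrics = Tuple[float, float, float]  # (precision, recall, coverage)


def best_threshold(scored_rules: Sequence[Tuple[str, float]],
                   thresholds: Sequence[float],
                   evaluate: Callable[[List[str]], Metrics],
                   aggregate: Callable[[float, float, float], float] = lambda p, r, c: p * r * c,
                   ) -> Optional[Tuple[float, float, List[str]]]:
    """Try each theta_s, keep rules scoring above it, return the best (theta, value, rules)."""
    best = None
    for theta_s in thresholds:
        rule_set = [rule for rule, score in scored_rules if score > theta_s]
        value = aggregate(*evaluate(rule_set))  # assumption: product of the three metrics
        if best is None or value > best[1]:
            best = (theta_s, value, rule_set)
    return best


# Hypothetical metrics, indexed by rule-set size.
TOY_METRICS = {3: (0.60, 0.9, 0.9), 2: (0.90, 0.9, 0.8), 1: (0.95, 0.9, 0.4)}
rules = [("ied", 0.9), ("ed", 0.7), ("ion", 0.3)]
theta, value, chosen = best_threshold(rules, [0.2, 0.5, 0.8], lambda rs: TOY_METRICS[len(rs)])
print(theta, chosen)  # 0.5 ['ied', 'ed']
```

A low threshold admits a weak rule that hurts precision, while a high one sacrifices coverage; the sweep picks the compromise.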

3 Learning Experiment 

One of the most important issues in the induction of guessing rule-sets is the choice of the right data for training. In our approach, guessing rules are extracted from the lexicon, and the actual corpus frequencies of word usage then allow for discrimination between rules which are no longer productive (but have left their imprint on the basic lexicon) and rules that are productive in real-life texts. Thus the major factor in the learning process is the lexicon: it should be as general as possible (list all possible POS-tags for a word) and as large as possible, since guessing rules are meant to capture general language regularities. The corresponding corpus should include most of the words from the lexicon and be large enough to obtain reliable estimates of the word-frequency distribution.

Guessing strategy         Lexicon                            Corpus
                          Precision  Recall    Coverage      Precision  Recall    Coverage
Suffix (S60)              0.920476   0.959087  0.373851      0.978246   0.973537  0.29785
Suffix with alt. (A80)    0.964433   0.97194   0.193404      0.996292   0.991106  0.187478
S60+A80                   0.925782   0.959568  0.4495        0.981375   0.977098  0.370538
A80+S60                   0.928376   0.959457  0.4495        0.981844   0.977165  0.370538

Ending (E75)              0.666328   0.94023   0.97741       0.755653   0.951342  0.958852
S60+E75                   0.728449   0.941157  0.978947      0.798186   0.947714  0.961047
S60+A80+E75               0.739347   0.941548  0.979181      0.805789   0.948022  0.961047
A80+S60+E75               0.740538   0.941497  0.979181      0.805965   0.948051  0.961047

Table 1: Results of the cascading application of the rule-sets over the training lexicon and training corpus. A80: suffixes with alterations scored over 80 points; S60: suffixes without alterations scored over 60 points; E75: ending-guessing rule-set scored over 75 points.

We performed a rule-induction experiment using the lexicon and word-frequencies derived from the Brown Corpus (Francis & Kucera, 1982). There are a number of reasons for choosing the Brown Corpus data for training. The most important ones are, first, that the Brown Corpus provides a model of general multi-domain language use, so general language regularities can be induced from it, and second, that many taggers come with data trained on the Brown Corpus, which is useful for comparison and evaluation. This, however, by no means restricts the described technique to that or any other tag-set, lexicon or corpus. Moreover, despite the fact that the training is performed on a particular lexicon and a particular corpus, the obtained guessing rules are supposed to be domain- and corpus-independent, and the only training-dependent feature is the tag-set in use.

Using the technique described above and the lexicon derived from the Brown Corpus, we extracted prefix morphological rules (no alterations), suffix morphological rules without alterations and ending-guessing rules, exactly as was done in (Mikheev, 1996). Then we extracted suffix morphological rules with alterations in the last letter (∇_1), which was a new rule-set for the cascading guesser. Quite interestingly, apart from the expected suffix rules with alterations such as:

[S=ied I=(NN VB) R=(JJ VBD VBN) M=y]



which can handle pairs like deny → denied, this rule-set was populated with "second-order" rules which describe dependencies between secondary forms of words. For instance, the rule

[S=ion I=(NNS VBZ) R=(NN) M=s]

says that if by deleting the suffix "ion" from a word and adding "s" to the end of the result of this deletion we produce a word which is listed in the lexicon as a plural noun and a third-person singular verb (NNS VBZ), the unknown word is a noun (NN). This rule, for instance, is applicable to the word pairs:

affects → affection, asserts → assertion, etc.

Table 1 presents some results of a comparative study of the cascading application of the new rule-set against the standard rule-sets of the cascading guesser. The first part of Table 1 shows the best obtained scores for the standard suffix rules (S) and suffix rules with alterations in the last letter (A). When we applied the two suffix rule-sets cascadingly, their joint lexical coverage increased by about 7-8% (from 37% to 45% on the lexicon and from 30% to 37% on the corpus), while precision and recall remained at the same high level. This was quite an encouraging result, which agreed with our prediction. Then we measured whether suffix rules with alterations (A) add any improvement if they are used in conjunction with the ending-guessing rules. As in the previous experiment, we measured precision, recall and coverage both on the lexicon and on the corpus. The second part of Table 1 shows that simple concatenative suffix rules (S60) improved the precision of the guessing by about 5% when they were applied before the ending-guessing rules (E75). Then we cascadingly applied the suffix rules with alterations (A80), which improved precision by about a further 1%.

After obtaining the optimal rule-sets we performed the same experiments on a word-sample which was not included in the training lexicon and corpus. We gathered about three thousand words from the lexicon developed for the Wall Street Journal corpus² and collected the frequencies of these words in this corpus. In this test-sample evaluation we obtained similar metrics apart from coverage, which dropped by about 7% for both kinds of suffix rules. This did not come as a surprise, since many main forms required by the suffix rules were missing from the lexicon.

Lexicon  Guessing strategy     Total   Unkn.   Total     Unkn.     Total    Unkn.
                               words   words   mistag.   mistag.   score    score
Full     standard: P+S+E       5,970   347     292       33        95.1%    90.5%
Full     with new: P+A+S+E     5,970   347     292       33        95.1%    90.5%
Small    standard: P+S+E       5,970   2,215   332       309       94.44%   86.05%
Small    with new: P+A+S+E     5,970   2,215   311       288       94.79%   87.00%

Table 2: Results of tagging a text using the standard Prefix+Suffix+Ending cascading guesser and the guesser with the additional rule-set of suffixes with alterations. For each of these cascading guessers two tagging experiments were performed: the tagger was equipped with the full Brown Corpus lexicon and with the small lexicon of closed-class and short words (5,465 entries).

4 Evaluation 

The direct performance measures of the rule-sets gave us grounds for the comparison and selection of the best performing guessing rule-sets. The task of unknown-word guessing is, however, a subtask of the overall part-of-speech tagging process. Thus we are mostly interested in how the advantage of one rule-set over another affects tagging performance. So, we performed an independent evaluation of the impact of the word-guessing sets on tagging accuracy. In this evaluation we used the cascading application of prefix rules, suffix rules and ending-guessing rules as described in (Mikheev, 1996). We measured whether the addition of the suffix rules with alterations increases the accuracy of tagging in comparison with the standard rule-sets. In this experiment we used a tagger which was a C++ re-implementation of the Lisp-implemented HMM Xerox tagger described in (Kupiec, 1992), trained on the Brown Corpus. For words which the guessing rules failed to guess, we applied the standard method of classifying them as common nouns (NN) if they are not capitalised inside a sentence and as proper nouns (NP) otherwise.
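The fallback for unguessed words can be sketched as a wrapper around any guesser. We assume that a capitalised sentence-initial word is not evidence of a proper noun and therefore falls back to NN; the `never_fires` guesser is a hypothetical placeholder.

```python
from typing import Callable, Optional, Set


def fallback_tags(token: str, sentence_initial: bool) -> Set[str]:
    """Default for words no guessing rule covers: NP if capitalised mid-sentence, else NN."""
    if token[:1].isupper() and not sentence_initial:
        return {"NP"}
    return {"NN"}  # assumption: sentence-initial capitals are treated like lower case


def candidate_tags(token: str, sentence_initial: bool,
                   guesser: Callable[[str], Optional[Set[str]]]) -> Set[str]:
    guess = guesser(token)
    return guess if guess else fallback_tags(token, sentence_initial)


def never_fires(word: str) -> Optional[Set[str]]:
    return None  # hypothetical guesser with no applicable rules


print(candidate_tags("Kupiec", False, never_fires))  # {'NP'}
print(candidate_tags("zorp", False, never_fires))    # {'NN'}
```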

In the evaluation of tagging accuracy on unknown words we paid attention to two metrics. First we measure the accuracy of tagging solely on unknown words:

$$ \mathrm{UnknownScore} = \frac{\mathrm{CorrectlyTaggedUnknownWords}}{\mathrm{TotalUnknownWords}} $$

This metric gives us the exact measure of how the tagger has done when equipped with different guessing rule-sets. In this case, however, we do not account for the known words which were mistagged because of the unknown ones. To put that aspect in perspective, we measure the overall tagging performance:

$$ \mathrm{TotalScore} = \frac{\mathrm{CorrectlyTaggedWords}}{\mathrm{TotalWords}} $$
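Both scores can be recomputed from the raw counts in Table 2. The sketch assumes that the "Total mistag." column counts all mistagged tokens, including the unknown ones, which is consistent with the percentages reported there.

```python
from typing import Tuple


def tagging_scores(total_words: int, unknown_words: int,
                   total_mistagged: int, unknown_mistagged: int) -> Tuple[float, float]:
    """Return (UnknownScore, TotalScore) from raw token counts."""
    unknown_score = (unknown_words - unknown_mistagged) / unknown_words
    total_score = (total_words - total_mistagged) / total_words
    return unknown_score, total_score


# Counts from the small-lexicon runs in Table 2.
for name, counts in [("P+S+E", (5970, 2215, 332, 309)), ("P+A+S+E", (5970, 2215, 311, 288))]:
    unknown, total = tagging_scores(*counts)
    print(f"{name}: UnknownScore={unknown:.2%} TotalScore={total:.2%}")
# P+S+E: UnknownScore=86.05% TotalScore=94.44%
# P+A+S+E: UnknownScore=87.00% TotalScore=94.79%
```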

To perform such an evaluation we tagged several texts of different origins, none of them from the Brown Corpus. These texts were not seen at the training phase, which means that neither the tagger nor the guesser had been trained on these texts, and they naturally contained words unknown to the lexicon. For each text we performed two tagging experiments. In the first experiment we tagged the text with the full-fledged Brown Corpus lexicon and hence had only those unknown words which naturally occur in this text. In the second experiment we tagged the same text with a lexicon which contained only closed-class³ and short⁴ words. This small lexicon contained only 5,456 entries out of the 53,015 entries of the original Brown Corpus lexicon. All other words were considered unknown and had to be guessed by the guesser. In both experiments we measured tagging accuracy with the guesser equipped with the standard Prefix+Suffix+Ending rule-sets and with the additional rule-set of suffixes with alterations in the last letter.

Table 2 presents the results of a typical example of such experiments. There we tagged a text of 5,970 words. This text turned out to have 347 words unknown to the Brown Corpus lexicon and, as can be seen, the additional rule-set did not cause any improvement in tagging accuracy. Then we tagged the same text using the small lexicon. Of the 5,970 words of the text, 2,215 were unknown to the small lexicon. Here the additional rule-set improved the tagging accuracy on unknown words by about 1%: 21 more word-tokens were tagged correctly because of the additional rule-set. Among these words were "classified", "applied", "tries", "tried", "merging", "subjective", etc.



² These words were not listed in the training lexicon.

³ Articles, prepositions, conjunctions, etc.

⁴ Shorter than 5 characters.



5 Discussion and Conclusion 

The target of the research reported in this paper was to incorporate the learning of morphological word-POS guessing rules which do not obey simple concatenations of main words with affixes into the learning paradigm proposed in (Mikheev, 1996). To do that we extended the data structures and the algorithms for guessing-rule application to handle mutations in the last n letters of the main words. Thus simple concatenative rules naturally became a subset of the mutative rules: they can be seen as mutative rules with zero mutation, i.e. with an empty M element. Simple concatenative rules, however, are not necessarily regular morphological rules, and quite often they capture other non-linear morphological dependencies. For instance, consonant doubling is naturally captured by the affixes themselves and obeys simple concatenation, as described, for example, by the suffix rule A_s:

[S=ging I=(NN VB) R=(JJ NN VBG) M=""]

This rule, for example, will work for word pairs like tag → tagging or dig → digging. Note that here we do not specify the prerequisites for the stem-word to have one syllable and to end with the same consonant as the beginning of the affix. Our task here is not to provide a precise morphological description of English but rather to support computationally effective POS-guessing by employing some morphological information. So, instead of using a proper morphological processor, we adopted an engineering approach which is argued for in (Mikheev & Liubushkina, 1995). There is, of course, nothing wrong with morphological processors per se, but it is hardly feasible to retrain them fully automatically for new tag-sets or to induce new rules. Our shallow technique, on the contrary, allows such rules to be induced completely automatically and ensures that these rules will have enough discriminative features for robust guessing. In fact, we abandoned the notion of morpheme and deal with word segments regardless of whether they are "proper" morphemes or not. So, for example, in the rule above "ging" is considered a suffix, which in principle is not right: the suffix is "ing" and "g" is the doubled consonant. Clearly, such nuances are impossible to learn automatically without specially prepared training data, which the technique in use does not rely on. On the other hand, it is not clear that this fine-grained information would contribute to the task of morphological guessing. The simplicity of the proposed shallow morphology, however, ensures fully automatic acquisition of such rules, and the empirical evaluation presented in Section 2.3 confirmed that they are just right for the task: the precision and recall of such rules were measured in the range of 96-99%.



The other aim of the research reported here was to assess whether non-concatenative morphological rules improve the overall performance of the cascading guesser. As measured in (Mikheev, 1996), simple concatenative prefix and suffix morphological rules improved the overall precision of the cascading guesser by about 5%, which resulted in 2% higher accuracy of tagging on unknown words. The additional rule-set of suffix rules with one-letter mutation caused some further improvement: the precision of the guessing increased by about 1%, and the tagging accuracy on a very large set of unknown words increased by about 1%. In conclusion, although the ending-guessing rules, which are much simpler than morphological rules, can handle words with affixes longer than two characters almost equally well, in the framework of POS-tagging even a fraction of a percent is an important improvement. Therefore the contribution of the morphological rules is valuable and necessary for the robust POS-tagging of real-world texts.